Abstract

For general exact repair regenerating codes, the optimal trade-offs between storage size and repair
bandwidth remain undetermined. Various outer bounds and partial results have been proposed. Using
a simple chain rule argument, we identify nonnegative differences between the functional repair and the
exact repair outer bounds. One of the differences is then bounded from below by the repair data of
a shortened subcode. Our main result is a new outer bound for an exact repair regenerating code in
terms of its shortened subcodes. In general the new outer bound is implicit and depends on the choice
of shortened subcodes. For the linear case we obtain explicit bounds.


1 Introduction 

Regenerating codes were introduced by Dimakis, Godfrey, Wu, Wainwright and Ramchandran [2]. Their 
main application is in large distributed storage systems where they lead to significant savings by optimizing 
the trade-off between storage size and repair bandwidth. In a distributed storage system (DSS) data is stored
at N nodes such that it can be recovered from any combination of k nodes. If a node fails it can be rebuilt 
by retrieving the information needed for its repair from any combination of d other nodes. An encoding 
scheme realizing these parameters is called an (N,k,d) regenerating code. 

An $(N, k, d)$ code comes with a secondary set of parameters $(B, \alpha, \beta)$. For data of total size $B$, a part of
size at most $\alpha$ is stored at a single node, and the bandwidth between a node and any of the $d$ nodes helping in its
repair is limited to $\beta$. The gains in a DSS are obtained by using a bandwidth $d\beta$ for the repair of a single node
that is possibly larger than its data size $\alpha$ but much smaller than the total data size $B$. The challenge is,
given $(N, k, d)$, to optimize the trade-off between the storage $\alpha$ per node and the repair bandwidth $\beta$ between
nodes in order to store data of size $B$. Constructive solutions that yield lower bounds for $B$, or inner bounds,
can be found in [8], [10], [11], [6], [?], [?], [?], [?].

Without the added access and repair constraints, $N$ nodes will be able to store data of size $B = N\alpha$. The
requirement that data can be recovered from any $k$ nodes reduces this amount to $B \le k\alpha$. The requirement
that a node can be repaired with help from any $d$ other nodes introduces further overhead and reduces the
size of the data. A first upper bound that takes into account both access and repair requirements is

$$ B \le (k-\ell)\alpha + \binom{\ell}{2}\beta + \ell(d+1-k)\beta, \quad 0 \le \ell \le k. \tag{1} $$

The upper bound holds for functional repair and thus for exact repair regenerating codes. In the exact repair
scenario it is required that a damaged node be rebuilt to its original form. Functional repair uses the weaker
assumption that a node be rebuilt to a form that preserves the functionality of the DSS. The upper bound
(1) is attained in the functional repair scenario [2] (using arguments from network coding) but is not optimal
for exact repair regenerating codes. This was first shown by Tian [?] for codes of type $(n = 4, k = 3, d = 3)$.
Further results on outer bounds for exact repair are in [?], [?], [?], [?].

In this paper we present a new improved outer bound for exact repair regenerating codes. First we refine
the proof for the outer bound (1) using a simple chain rule argument. This exposes several nonnegative error
terms. We then focus on one particular error term and as main result we formulate an improved version
of the outer bound where this error term is bounded from below. Theorems 3.2 and 4.2 in [3] describe two
other improvements of the outer bound (1). The arguments that are used in [4] are different from the ones
used in this paper. In Appendix A the different improvements are illustrated by three different proofs for
the improved outer bound for the case $(n = 4, k = 3, d = 3)$. Before we describe the main results in more
detail we introduce the notation.

1.1 Notation 

We will use the entropy terminology to express the various bounds. When the random variable $X$ corresponds
to the drawing of a vector, uniformly at random, from a finite vector space $\mathcal{X}$ we have $H(X) = \log |\mathcal{X}| = \dim \mathcal{X}$,
for the appropriate choice of base in the logarithm. For subspaces $\mathcal{X}, \mathcal{Y}$ of a common vector space, the usual dictionary
between entropy and dimension includes the relations

(joint entropy) $H(X,Y) = \dim(\mathcal{X} + \mathcal{Y})$

(conditional entropy) $H(X \mid Y) = \dim \mathcal{X}/(\mathcal{X} \cap \mathcal{Y}) = \dim (\mathcal{X} + \mathcal{Y})/\mathcal{Y}$

(mutual information) $I(X;Y) = \dim \mathcal{X} \cap \mathcal{Y}$
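
The dictionary can be checked on a small example. The following is a minimal sketch, assuming subspaces of GF(2)^m that are given by spanning sets of integer bitmasks; the helper names `rank_gf2`, `joint_entropy`, `conditional_entropy` and `mutual_information` are illustrative and do not come from the paper.

```python
def rank_gf2(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = []  # each element has a leading bit that no later element has
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clear the leading bit of b in v if it is set
        if v:
            basis.append(v)
    return len(basis)


def joint_entropy(X, Y):
    """H(X, Y) = dim(X + Y)."""
    return rank_gf2(list(X) + list(Y))


def conditional_entropy(X, Y):
    """H(X | Y) = dim(X + Y) - dim(Y)."""
    return joint_entropy(X, Y) - rank_gf2(Y)


def mutual_information(X, Y):
    """I(X; Y) = dim(X) + dim(Y) - dim(X + Y), the dimension of the intersection."""
    return rank_gf2(X) + rank_gf2(Y) - joint_entropy(X, Y)


# X = span{e1, e2} and Y = span{e2, e3} inside GF(2)^3.
X = [0b001, 0b010]
Y = [0b010, 0b100]
print(joint_entropy(X, Y), conditional_entropy(X, Y), mutual_information(X, Y))
```

Here the two planes meet in the line spanned by e2, so the joint entropy is 3, the conditional entropy is 1, and the mutual information is 1.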

To an exact repair regenerating code of type $(N, k, d)$ with secondary parameters $(B, \alpha, \beta)$ correspond random
variables $M$, $\{W_j : 1 \le j \le n\}$ and $\{S_{i\to j} : 1 \le i,j \le n, i \ne j\}$ that satisfy several entropy constraints.

The variable $M$ describes the data to be stored at the $N$ nodes and has entropy $H(M) = B$. The variable
$W_j$ is a function of $M$ that describes the data stored at node $j$, and the variable $S_{i\to j}$ is a function of $W_i$
that describes the helper information provided by node $i$ to repair node $j$. The entropy constraints are the
following.

(Storage) $H(W_j) = \alpha$, $H(W_j \mid M) = 0$, $H(M \mid W_J) = 0$ for $|J| \ge k$.

(Repair) $H(S_{i\to j}) = \beta$, $H(S_{i\to j} \mid W_i) = 0$, $H(W_j \mid S_{I\to j}) = 0$ for $|I| \ge d$, $j \notin I$.

Here $W_J$ denotes the joint distribution $W_J = (W_j : j \in J)$ and $S_{I\to j}$ denotes the joint distribution $S_{I\to j} = (S_{i\to j} : i \in I)$,
for $j \notin I$. Assuming uniform distributions for each of the variables, the conditions $H(M) = B$,
$H(W_j) = \alpha$ and $H(S_{i\to j}) = \beta$ describe the size of the underlying space for $M$, $W_j$ and $S_{i\to j}$, respectively.
The access condition $H(M \mid W_J) = 0$ for $|J| \ge k$ says that the data can be recovered from information stored
on any $k$ nodes, and similarly $H(W_j \mid S_{I\to j}) = 0$ for $|I| \ge d$, $j \notin I$ says that node $j$ can be rebuilt with helper
information received from any $d$ remaining nodes.

For a linear regenerating code the above can be restated in terms of generating and parity-check matrices.
The generator matrix is a matrix of size $B \times N\alpha$ with $B$ independent rows and $N$ blocks of columns, with
$\alpha$ columns in each block. The variable $M$ corresponds to the column space of the matrix, the variable $W_j$
to the column space of the $j$th block of $\alpha$ columns, and the variable $S_{i\to j}$ to a subspace of $W_i$. The access
conditions say that the full column space $M$ is generated by any $k$ of the $N$ subspaces $W_j$, and that $W_j$ is
generated by any $d$ of the subspaces $S_{i\to j}$, $i \ne j$. Details for the parity-check matrix of a linear regenerating
code are in [?, Section 2.1].

1.2 Outline and Results

To describe the main result, consider the data collection scenario in Figure 1. Data is collected from a subset
of $n$ nodes that are numbered 1 to $n$ (out of a total of $N$ nodes). For a given $\ell$ with $0 \le \ell \le n$, the contents
of nodes $\ell+1$ to $n$ are read from the nodes, an amount of size $(n-\ell)\alpha$. The contents of nodes $\ell, \ell-1, \ldots, 1$
are recovered in that order using repair information. When it is time to collect repair information for node
$j$, for $1 \le j \le \ell$, repair information for that node is already available from nodes $j+1, \ldots, n$. The missing
repair information can be collected from nodes $1, \ldots, j-1$ and from any $d+1-n$ nodes that are not among
nodes 1 to $n$. Thus, with $B(n) = H(W_{[1,n]})$ the information content of $n$ nodes,

$$ B(n) \le B_\ell(n) := (n-\ell)\alpha + \binom{\ell}{2}\beta + \ell(d+1-n)\beta, \quad 0 \le \ell \le n. \tag{2} $$

Figure 1: Data collection from n nodes.
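
The counting behind the bound (2) can be turned into a few lines of code. The following is a minimal sketch with hypothetical function names `B_ell` and `cutset_sum`; it evaluates $B_\ell(n)$ for every $\ell$ and compares the minimum with the per-node form $\sum_{j=1}^{n} \min(\alpha, (d+j-n)\beta)$, in which node $j$ is either read in full or repaired with $d+j-n$ additional helper messages.

```python
from math import comb


def B_ell(n, ell, d, alpha, beta):
    """Read nodes ell+1..n in full and repair nodes ell, ..., 1 in that order."""
    return (n - ell) * alpha + comb(ell, 2) * beta + ell * (d + 1 - n) * beta


def functional_bound(n, d, alpha, beta):
    """Minimum of B_ell(n) over 0 <= ell <= n."""
    return min(B_ell(n, ell, d, alpha, beta) for ell in range(n + 1))


def cutset_sum(n, d, alpha, beta):
    """Node j costs alpha if read directly, (d + j - n) * beta if repaired."""
    return sum(min(alpha, (d + j - n) * beta) for j in range(1, n + 1))


# Example: n = k = 3 and d = 3 with alpha = 2, beta = 1.
print([B_ell(3, ell, 3, 2, 1) for ell in range(4)])
print(functional_bound(3, 3, 2, 1), cutset_sum(3, 3, 2, 1))
```

The two forms agree because the repair cost $(d+j-n)\beta$ increases with $j$, so the nodes that are cheaper to repair than to read always form an initial segment $1, \ldots, \ell$.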

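In Section 2 we prove a version of this bound that includes an extra error term.

(Theorem 2.3) For $0 \le \ell \le n$,

$$ B(n) + \Delta \le B_\ell(n), $$

where $\Delta = \sum_{1 \le i < j \le \ell} H(S_{i\to j} \mid W_{[i+1,n]}) \ge 0$.

In Section 3 we exploit the error term to improve the upper bound.

(Theorem 3.3) For $u = 1, 2, \ldots, v$, let $W_{n+u}$ be such that

$$ H(S_{i\to j} \mid W_{n+u}) \le H(S_{i\to j} \mid W_{[i+1,n+u-1]}), \quad \text{for } 1 \le i < j \le \ell. $$

Then

$$ B(n) + vB(n) \le H(W_{[\ell+1,n]}) + \sum_{1 \le i < j \le \ell} H(S_{i\to j}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}) + \sum_{u=1}^{v} \Big( H(W_{[\ell+1,n]}) + H(W_{n+u} \mid W_{[\ell+1,n]}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}, W_{n+u}) \Big). $$

In Section 4 we give a choice $W_{n+u}$ for linear regenerating codes such that $H(W_{n+u}) = u(n\alpha - B(n))$.
With this choice the upper bound becomes

(Theorem 4.2) For a linear regenerating code, and for $v \ge 0$,

$$ \binom{v+2}{2} B(n) \le (v+1) B_\ell(n) + \binom{v+1}{2} n\alpha - v\binom{\ell}{2}\beta. $$

For $n = k+1 = d+1$, and for $\ell = n$,

$$ \binom{v+2}{2} B \le \binom{v+1}{2} n\alpha + \binom{n}{2}\beta. $$

This bound is Theorem 1.1 in [7]. It is attained by layered codes (defined in [15]).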
2 Refinement of the exact repair outer bound

For a regenerating code of length $N$ we fix an arbitrary ordering of the $N$ nodes and denote by $B(n)$
the amount of data on the last $n$ nodes. For a given $n$, we number the last $n$ nodes from 1 to $n$. The
remaining $N-n$ nodes are numbered from 0 downwards. The outer bound (2) is piecewise linear of the form
$B(n) \le \min\{B_\ell(n) : 0 \le \ell \le n\}$, with each of the $B_\ell(n)$ a linear combination of the storage per node $\alpha$ and
the helper bandwidth between nodes $\beta$. In this section we derive a version $B(n) + \Delta \le \min\{B_\ell(n) : 0 \le \ell \le n\}$
with an explicit error term $\Delta$. For $n = k$, the error term $\Delta$ gives a lower bound for the gap between the
functional repair and the exact repair outer bounds.

In deriving the outer bound we will only refer to helper information $S_{i\to j}$ for $i < j$. For a given $j$,
$1 \le j \le n$, we consider the sequence $X_1, X_2, \ldots, X_n$ of $n$ variables

$$ X_i = \begin{cases} S_{i\to j}, & \text{for } i < j, \\ W_i, & \text{for } i \ge j. \end{cases} \tag{3} $$

In either of the two cases $X_i$ is a function of the information $W_i$ at node $i$. The following lemma is a
straightforward application of the chain rule and holds for an arbitrary sequence of $n$ random variables.
Nonetheless it is at the basis of everything that follows.

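Lemma 2.1. For a sequence $X_1, \ldots, X_n$ of $n$ random variables, and for $1 \le j \le n$,

$$ H(X_j \mid X_{[j+1,n]}) + \sum_{i<j} H(X_i \mid X_{[i+1,n]}) = \sum_{i<j} H(X_i \mid X_{[i+1,n]\setminus j}) + H(X_j \mid X_{[1,n]\setminus j}). $$

Proof. The claim says that the joint entropy is the same for the sequence $0, X_1, \ldots, X_n$ and the permuted
sequence with $0$ and $X_j$ exchanged. For a formal proof, apply the chain rule $j$ times to $H(X_{[1,n]})$ and $j-1$
times to $H(X_{[1,n]\setminus j})$,

$$ H(X_{[1,n]}) = H(X_{[j+1,n]}) + \sum_{i \le j} H(X_i \mid X_{[i+1,n]}), $$

$$ H(X_{[1,n]\setminus j}) = H(X_{[j+1,n]}) + \sum_{i<j} H(X_i \mid X_{[i+1,n]\setminus j}). $$

Now use $H(X_{[1,n]}) = H(X_{[1,n]\setminus j}) + H(X_j \mid X_{[1,n]\setminus j})$. □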
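
The chain rule identity of Lemma 2.1 can be checked numerically in the linear setting, where entropy is dimension. The following is a minimal sketch, assuming random subspaces of GF(2)^8 given by two random generators each; the names `rank_gf2`, `H`, `H_cond` and `lemma_2_1_sides` are illustrative only.

```python
import random


def rank_gf2(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def H(X, idx):
    """Joint entropy of the X[i], i in idx: dimension of the sum of the subspaces."""
    return rank_gf2([v for i in idx for v in X[i]])


def H_cond(X, i, given):
    """Conditional entropy H(X_i | X_given)."""
    given = list(given)
    return H(X, [i] + given) - H(X, given)


def lemma_2_1_sides(X, n, j):
    """Left and right side of the identity in Lemma 2.1 for a given j."""
    left = sum(H_cond(X, i, range(i + 1, n + 1)) for i in range(1, j + 1))
    right = sum(H_cond(X, i, [t for t in range(i + 1, n + 1) if t != j])
                for i in range(1, j))
    right += H_cond(X, j, [t for t in range(1, n + 1) if t != j])
    return left, right


random.seed(0)
n = 5
X = {i: [random.getrandbits(8) for _ in range(2)] for i in range(1, n + 1)}
for j in range(1, n + 1):
    print(j, lemma_2_1_sides(X, n, j))
```

Both sides equal $H(X_{[1,n]}) - H(X_{[j+1,n]})$, so the two printed numbers agree for every $j$ regardless of the random choice.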

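We apply the lemma to the sequence (3).

Proposition 2.2.

$$ H(W_j \mid W_{[j+1,n]}) + \sum_{i<j} H(S_{i\to j} \mid W_{[i+1,n]}) \le \sum_{i<j} H(S_{i\to j}) + H(W_j \mid S_{[1,n]\setminus j \to j}). $$

Proof. In Lemma 2.1 we replace the second term on the left with a smaller term and the two terms on the
right with larger terms. □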

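Using the proposition $\ell$ times we obtain a refinement of the outer bound (2).

Theorem 2.3. For $0 \le \ell \le n \le d+1$,

$$ B(n) + \Delta \le H(W_{[\ell+1,n]}) + \sum_{1 \le i < j \le \ell} H(S_{i\to j}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}) \tag{4} $$

$$ \le (n-\ell)\alpha + \binom{\ell}{2}\beta + \ell(d+1-n)\beta, $$

where $\Delta = \sum_{1 \le i < j \le \ell} H(S_{i\to j} \mid W_{[i+1,n]}) \ge 0$.

Proof. With $B(n) = H(W_{[\ell+1,n]}) + \sum_{j \le \ell} H(W_j \mid W_{[j+1,n]})$,

$$ \begin{aligned} B(n) + \sum_{j \le \ell} \sum_{i<j} H(S_{i\to j} \mid W_{[i+1,n]}) &= H(W_{[\ell+1,n]}) + \sum_{j \le \ell} H(W_j \mid W_{[j+1,n]}) + \sum_{j \le \ell} \sum_{i<j} H(S_{i\to j} \mid W_{[i+1,n]}) \\ &\le H(W_{[\ell+1,n]}) + \sum_{j \le \ell} \sum_{i<j} H(S_{i\to j}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}). \end{aligned} $$

For the inequality use the proposition. □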


The theorem shows that for a given $0 \le \ell \le n$, there is a gap in the upper bound (2) of size at least

$$ \Delta = \sum_{1 \le i < j \le \ell} H(S_{i\to j} \mid W_{[i+1,n]}). \tag{5} $$

The terms in the sum capture that part of the helper information $S_{i\to j}$ may be redundant. There are two
important cases with $\Delta = 0$, Minimum Storage Regenerating codes (MSR codes) and Minimum Bandwidth
Regenerating codes (MBR codes). MSR codes have $I(W_j; W_{J\setminus j}) = 0$ for $|J| = k$ and $H(M) = k\alpha$. For MSR
codes, the bound (2) is achieved for $\ell = 0$ and the summation for $\Delta$ is empty. Note that the summation for
$\Delta$ is empty also when $\ell = 1$. MBR codes have $I(S_{i\to j}; S_{I\setminus i \to j}) = 0$ for $|I| = d$ and $H(W_j) = d\beta$. For MBR
codes, the bound (2) is achieved for $\ell = k$ and the terms in the summation for $\Delta$ are all zero. For values of
$\ell \notin \{0, 1, k\}$ we obtain improvements of (2) from lower bounds for the gap (5). Our approach is to collect
the helper information at a separate node $W$ such that $H(S_{i\to j} \mid W) \le H(S_{i\to j} \mid W_{[i+1,n]})$. The same chain
rule argument of Lemma 2.1 goes through if we add $W$ as node $W_{n+1}$ to the nodes $W_1, W_2, \ldots, W_n$. This is
worked out in the next section.

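While Theorem 2.3 focuses on $\Delta$ as the main gap in the upper bound, three other gaps can be pointed
out. They are due to the transition from an equality in Lemma 2.1 to an inequality in Proposition 2.2 by
replacing three of the terms. We quantify these gaps but will not consider them further in this paper.

$$ H(S_{i\to j} \mid W_{[i+1,n]}) = H(X_i \mid X_{[i+1,n]}) - \epsilon^{(1)}_{i,j}, \qquad \epsilon^{(1)}_{i,j} = I(S_{i\to j}; W_{[i+1,j-1]} \mid X_{[i+1,n]}) \ge 0 $$

$$ H(X_i \mid X_{[i+1,n]\setminus j}) = H(S_{i\to j}) - \epsilon^{(2)}_{i,j}, \qquad \epsilon^{(2)}_{i,j} = I(S_{i\to j}; X_{[i+1,n]\setminus j}) \ge 0 $$

$$ H(X_j \mid X_{[1,n]\setminus j}) = H(W_j \mid S_{[1,n]\setminus j \to j}) - \epsilon^{(3)}_{j}, \qquad \epsilon^{(3)}_{j} = I(W_j; W_{[j+1,n]} \mid S_{[1,n]\setminus j \to j}) \ge 0 $$

All three gaps vanish for the important class of layered codes (defined in [15]).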
3 Improvement of the exact repair outer bound

Starting point for the outer bound (4) in Theorem 2.3 is Lemma 2.1. The identity

$$ H(X_j \mid X_{[j+1,n]}) + \sum_{i<j} H(X_i \mid X_{[i+1,n]}) = \sum_{i<j} H(X_i \mid X_{[i+1,n]\setminus j}) + H(X_j \mid X_{[1,n]\setminus j}) $$

holds for any $n$ random variables with a common joint distribution. We applied it $\ell$ times, for $j \le \ell$. For
each $j \le \ell$ it was used with the choice of variables

$$ X_i = \begin{cases} S_{i\to j}, & \text{for } i < j, \\ W_i, & \text{for } i \ge j. \end{cases} \tag{6} $$

To estimate the term $\Delta$ given by (5) we apply the same bound $v$ more times. Each time, before the bound is
applied we add a carefully chosen term $X_{n+u} = W_{n+u}$ to the sequence $X_1, X_2, \ldots, X_{n+u-1}$, for $1 \le u \le v$.
Here $W_{n+u}$ is any function of $M$ such that

$$ H(S_{i\to j} \mid W_{n+u}) \le H(S_{i\to j} \mid W_{[i+1,n+u-1]}), \quad \text{for } 1 \le i < j \le \ell. \tag{7} $$

The variables $W_{n+u}$ may be identified with added virtual nodes. For given $j \le \ell$, the application of Lemma
2.1 to the extended sequence $X_1, X_2, \ldots, X_{n+u}$ yields, for $0 \le u \le v$,

$$ H(X_j \mid X_{[j+1,n+u]}) + \sum_{i<j} H(X_i \mid X_{[i+1,n+u]}) = \sum_{i<j} H(X_i \mid X_{[i+1,n+u]\setminus j}) + H(X_j \mid X_{[1,n+u]\setminus j}). \tag{8} $$

In the following lemma we take the sum of these $v+1$ equations.

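Lemma 3.1. Let $X_1, \ldots, X_n, X_{n+1}, \ldots, X_{n+v}$ be random variables such that

$$ H(X_i \mid X_{n+u}) \le H(X_i \mid X_{[i+1,n+u-1]}) \quad \text{for } 1 \le i \le \ell, \ 1 \le u \le v. $$

Then, for $1 \le j \le \ell$,

$$ \sum_{u=0}^{v} H(X_j \mid X_{[j+1,n+u]}) + \sum_{i<j} H(X_i \mid X_{[i+1,n+v]}) \le \sum_{i<j} H(X_i \mid X_{[i+1,n]\setminus j}) + \sum_{u=0}^{v} H(X_j \mid X_{[1,n+u]\setminus j}). $$

Proof. Take the summation of (8) over $0 \le u \le v$. For $1 \le u \le v$, the inequality

$$ H(X_i \mid X_{[i+1,n+u]\setminus j}) \le H(X_i \mid X_{n+u}) \le H(X_i \mid X_{[i+1,n+u-1]}) $$

gives a cancellation of terms in the summation. □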

We apply the lemma with the sequence (6).

Proposition 3.2. Let $W_{n+u}$, $1 \le u \le v$, be such that

$$ H(S_{i\to j} \mid W_{n+u}) \le H(S_{i\to j} \mid W_{[i+1,n+u-1]}), \quad \text{for } 1 \le i < j \le \ell. $$

Then, for $1 \le j \le \ell$,

$$ \sum_{u=0}^{v} H(W_j \mid W_{[j+1,n+u]}) + \sum_{i<j} H(S_{i\to j} \mid W_{[i+1,n+v]}) \le \sum_{i<j} H(S_{i\to j}) + \sum_{u=0}^{v} H(W_j \mid S_{[1,n]\setminus j \to j}, W_{n+u}), $$

where for $u = 0$ the conditioning on $W_{n+u}$ is omitted.

Proof. In the result of Lemma 3.1 we replace the terms on the left with smaller terms and the terms on the
right with larger terms. Since $X_{n+u} = W_{n+u}$ and $H(X_i \mid W_i) = 0$ for all $i \le n+u$, the condition in the
proposition guarantees the condition that is needed for the lemma. For $1 \le i \le \ell$,

$$ H(X_i \mid X_{n+u}) = H(X_i \mid W_{n+u}) \le H(X_i \mid W_{[i+1,n+u-1]}) \le H(X_i \mid X_{[i+1,n+u-1]}). $$

□


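Theorem 3.3. For given $0 \le \ell \le n$ and $v \ge 0$, let $W_{n+u}$, $1 \le u \le v$, be such that

$$ H(S_{i\to j} \mid W_{n+u}) \le H(S_{i\to j} \mid W_{[i+1,n+u-1]}), \quad \text{for } 1 \le i < j \le \ell. $$

Then

$$ B(n) + vB(n) \le H(W_{[\ell+1,n]}) + \sum_{1 \le i < j \le \ell} H(S_{i\to j}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}) + \sum_{u=1}^{v} \Big( H(W_{[\ell+1,n]}) + H(W_{n+u} \mid W_{[\ell+1,n]}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}, W_{n+u}) \Big). $$

Proof. The proposition yields, after summation over $1 \le j \le \ell$,

$$ \sum_{j \le \ell} H(W_j \mid W_{[j+1,n]}) + \sum_{u=1}^{v} \sum_{j \le \ell} H(W_j \mid W_{[j+1,n+u]}) \le \sum_{j \le \ell} \sum_{i<j} H(S_{i\to j}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}) + \sum_{u=1}^{v} \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}, W_{n+u}). $$

So that

$$ B(n) + vB(n) \le \sum_{j \le \ell} \sum_{i<j} H(S_{i\to j}) + \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}) + \sum_{u=1}^{v} \sum_{j \le \ell} H(W_j \mid S_{[1,n]\setminus j \to j}, W_{n+u}) + H(W_{[\ell+1,n]}) + \sum_{u=1}^{v} H(W_{[\ell+1,n]}, W_{n+u}). $$

After reordering the terms, the claim follows. □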

Corollary 3.4. With notation and conditions as in the theorem,

$$ (v+1)B(n) \le (v+1)B_\ell(n) + \sum_{u=1}^{v} \Big( H(W_{n+u}) - \binom{\ell}{2}\beta \Big). $$

Remark 3.5. If we use Theorem 3.3 to obtain an upper bound for the conditional entropy $H(W_{[1,\ell]} \mid W_{[\ell+1,n]})$
we find

$$ (v+1)(B(n) - B(n-\ell)) \le (v+1)(B_\ell(n) - (n-\ell)\alpha) + \sum_{u=1}^{v} \Big( H(W_{n+u}) - \binom{\ell}{2}\beta \Big). $$

Together with $B(n-\ell) \le (n-\ell)\alpha$ this gives Corollary 3.4.

4 Linear regenerating codes

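For linear regenerating codes, Condition (7),

$$ H(S_{i\to j} \mid W_{n+u}) \le H(S_{i\to j} \mid W_{[i+1,n+u-1]}), \quad \text{for } 1 \le i < j \le \ell, $$

holds for

$$ W_{n+u} = (S_{i\to j} \cap W_{[i+1,n+u-1]} : 1 \le i < j \le \ell) $$

and thus for

$$ W_{n+u} = (W_i \cap W_{[i+1,n+u-1]} : 1 \le i \le \ell). $$

Lemma 4.1. For $W_{n+u}$ as above,

$$ H(W_{n+u}) \le u \Big( \sum_{i=1}^{\ell} H(W_i) + H(W_{[\ell+1,n]}) - H(W_{[1,n]}) \Big) \le u \min\{\ell\alpha, \ n\alpha - H(W_{[1,n]})\}. $$

Proof.

$$ \begin{aligned} H(W_{n+u}) &\le \sum_{i \le \ell} \big( H(W_i) - H(W_i \mid W_{[i+1,n+u-1]}) \big) \\ &= \sum_{i \le \ell} H(W_i) + H(W_{[\ell+1,n+u-1]}) - H(W_{[1,n+u-1]}) \\ &\le \sum_{i \le \ell} H(W_i) + H(W_{[\ell+1,n]}) + H(W_{[n+1,n+u-1]}) - H(W_{[1,n]}). \end{aligned} $$

Now apply induction to complete the proof. □

We apply Corollary 3.4.

Theorem 4.2. For a linear regenerating code, and for $v \ge 0$,

$$ \binom{v+2}{2} B(n) \le (v+1) B_\ell(n) + \binom{v+1}{2} n\alpha - v\binom{\ell}{2}\beta. $$

Proof. Use Corollary 3.4 in combination with Lemma 4.1,

$$ (v+1)B(n) \le (v+1)B_\ell(n) + \sum_{u=1}^{v} \Big( u(n\alpha - B(n)) - \binom{\ell}{2}\beta \Big). \tag{9} $$

□

For linear regenerating codes with $n = k+1 = d+1$, and for $\ell = n$, the bound

$$ \binom{v+2}{2} B \le \binom{v+1}{2} n\alpha + \binom{n}{2}\beta $$

is Theorem 1.1 in [7].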


For a regenerating code with $(k = 2p, d = 3p)$ and $(\alpha = 2p, \beta = 1)$, the functional repair outer bound
yields $B \le (7p^2 + p)/2$. The minimum in $B \le \min\{B_\ell(k) : 0 \le \ell \le k\}$ is attained for $\ell = p$. Corollary 3.3 in
[4] lowers the bound for exact repair regenerating codes by $(p^2 - 1)/16$. Theorem 4.2 with $\ell = n = k = 2p$
and $v = 1$ lowers the same bound by $p^2/6$.
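
The functional repair value quoted for this family is easy to confirm by direct evaluation. The following is a minimal sketch, assuming the expression $B_\ell(n) = (n-\ell)\alpha + \binom{\ell}{2}\beta + \ell(d+1-n)\beta$ from (2); the function names are illustrative. It only checks the functional repair bound and the location of the minimum, not the exact repair improvements.

```python
from math import comb


def B_ell(n, ell, d, alpha, beta):
    """Right-hand side of the bound (2) for a given ell."""
    return (n - ell) * alpha + comb(ell, 2) * beta + ell * (d + 1 - n) * beta


def functional_values(p):
    """All values B_ell(k) for k = 2p, d = 3p, alpha = 2p, beta = 1."""
    k, d, alpha, beta = 2 * p, 3 * p, 2 * p, 1
    return [B_ell(k, ell, d, alpha, beta) for ell in range(k + 1)]


for p in range(1, 6):
    values = functional_values(p)
    # The minimum is attained at ell = p and equals (7 p^2 + p) / 2.
    print(p, min(values), values.index(min(values)), values[p])
```

Since $B_{\ell+1}(k) - B_\ell(k) = \ell - p + 1$ for these parameters, the value at $\ell = p - 1$ ties with the value at $\ell = p$, which is why the printed index of the first minimum is $p - 1$.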
A Three proofs for (4, 3, 3) outer bounds

Proofs 1 and 2 are based on [3]. Proof 3 follows the current paper.

(Proof 1)

[Three 4 x 4 diagrams with rows and columns labeled 1 to 4.]

$$ B \le H(W_4 \mid W_3 S_{1\to 2}) + H(W_3 S_{1\to 2}) \le H(W_4 \mid S_{3\to 4}) + H(W_3 S_{1\to 2}) $$

$$ B \le H(S_{2\to 3} \mid W_1 S_{2\to 4} S_{3\to 4}) + H(W_1 S_{2\to 4} S_{3\to 4}) \le H(S_{2\to 3} \mid W_4 S_{3\to 4}) + H(W_1 S_{2\to 4}) + H(S_{3\to 4}) $$

$$ B \le H(S_{1\to 3} \mid W_2 S_{1\to 4} S_{3\to 4}) + H(W_2 S_{1\to 4} S_{3\to 4}) \le H(S_{1\to 3} \mid S_{2\to 3} W_4 S_{3\to 4}) + H(W_2 S_{1\to 4} S_{3\to 4}) $$

$$ \begin{aligned} 3B &\le H(S_{1\to 3} S_{2\to 3} W_4 S_{3\to 4}) + H(W_3 S_{1\to 2}) + H(W_1 S_{2\to 4}) + H(W_2 S_{1\to 4} S_{3\to 4}) \\ &= H(S_{1\to 3} S_{2\to 3} W_4) + H(W_3 S_{1\to 2}) + H(W_1 S_{2\to 4}) + H(W_2 S_{1\to 4} S_{3\to 4}) \end{aligned} $$

$$ 3B \le \sum_{1 \le j \le 4} H(W_j) + \sum_{1 \le i < j \le 4} H(S_{i\to j}) $$


(Proof 2)

[Three 4 x 4 diagrams with rows and columns labeled 1 to 4.]

$$ B \le H(W_3 W_4) + H(S_{1\to 2}) $$

$$ B \le H(S_{2\to 3} S_{2\to 4}) + H(W_1 S_{3\to 4}) $$

$$ B \le H(W_3 W_4 \mid W_2) + H(W_2) \le H(W_3 W_4 \mid S_{2\to 3} S_{2\to 4}) + H(W_2) $$

$$ \begin{aligned} 3B &\le H(W_3 W_4 S_{2\to 3} S_{2\to 4}) + H(W_3 W_4) + H(S_{1\to 2}) + H(W_1 S_{3\to 4}) + H(W_2) \\ &\le H(W_3 W_4 S_{2\to 3}) + H(W_3 W_4 S_{2\to 4}) + H(S_{1\to 2}) + H(W_1 S_{3\to 4}) + H(W_2) \\ &\le H(W_4 S_{1\to 3} S_{2\to 3}) + H(W_3 S_{1\to 4} S_{2\to 4}) + H(S_{1\to 2}) + H(W_1 S_{3\to 4}) + H(W_2) \end{aligned} $$

$$ 3B \le \sum_{1 \le i \le 4} H(W_i) + \sum_{1 \le i < j \le 4} H(S_{i\to j}) $$
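(Proof 3)

$$ \begin{array}{c|cccc|cccc}
 & 1 & 2 & 3 & 4 & 1 & 2 & 3 & 4 \\ \hline
M & 0 & 0 & 0 & 0 & W_1 & W_2 & W_3 & W_4 \\
W_1 & W_1 & S_{1\to 2} & S_{1\to 3} & S_{1\to 4} & 0 & S_{1\to 2} & S_{1\to 3} & S_{1\to 4} \\
W_2 & W_2 & W_2 & S_{2\to 3} & S_{2\to 4} & W_2 & 0 & S_{2\to 3} & S_{2\to 4} \\
W_3 & W_3 & W_3 & W_3 & S_{3\to 4} & W_3 & W_3 & 0 & S_{3\to 4} \\
W_4 & W_4 & W_4 & W_4 & W_4 & W_4 & W_4 & W_4 & 0 \\
W_5 & W_5 & W_5 & W_5 & W_5 & W_5 & W_5 & W_5 & W_5
\end{array} $$

Table 1: Pairs of columns with the same entropy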


Table 1 contains random variables $W_i$ and $S_{i\to j}$ for a regenerating code with four nodes and parameters
$(k = 3, d = 3)$. The last row is obtained by adding a node $W_5$ whose contents will be chosen later. Columns
with the same label $1 \le j \le 4$ contain the same variables. For a pair of columns with the same label we
compute the column entropy using the chain rule from the bottom to the top. The computation is done first
for columns from row $W_4$ upwards and then for the extended columns from row $W_5$ upwards. By invoking
the chain rule each entry in the table contributes to the entropy of its column with its entropy conditional
on the entries below it.


We compare the sum of the column entropies for the four columns on the left and on the right. First
we ignore the row $W_5$ (or set $W_5 = 0$). The entries below the diagonal produce the same terms left and
right. The four remaining entries with $W_i$ on the left (in the diagonal positions) sum to $H(W_1 W_2 W_3 W_4)$.
The four remaining entries with $W_i$ on the right (in the top row) all produce 0 terms (using $d = 3$). The
remaining entries on the right sum to at most $\sum_{1 \le i < j \le 4} H(S_{i\to j})$ (with equality if and only if for each term the
conditional entropy equals the actual entropy). We still have to account for the entries with $S_{i\to j}$ on the
left. In each case, an entry contributes at least $H(S_{i\to j} \mid W_{[i+1,4]})$. Thus

$$ H(W_1 W_2 W_3 W_4) + \sum_{1 \le i < j \le 4} H(S_{i\to j} \mid W_{[i+1,4]}) \le \sum_{1 \le i < j \le 4} H(S_{i\to j}). $$

We repeat the comparison but now include the constant row $W_5$.

$$ H(W_1 W_2 W_3 W_4 \mid W_5) + \sum_{1 \le i < j \le 4} H(S_{i\to j} \mid W_{[i+1,5]}) \le \sum_{1 \le i < j \le 4} H(S_{i\to j} \mid W_5). $$

For $W_5$ such that

$$ H(S_{i\to j} \mid W_5) \le H(S_{i\to j} \mid W_{[i+1,4]}) $$

we obtain

$$ H(W_1 W_2 W_3 W_4) + H(W_1 W_2 W_3 W_4 \mid W_5) \le H(W_1 W_2 W_3 W_4) + \sum_{1 \le i < j \le 4} H(S_{i\to j} \mid W_5) \le \sum_{1 \le i < j \le 4} H(S_{i\to j}). $$

In the linear setting it suffices to choose for $W_5$ a vector space that contains $W_j \cap W_{[j+1,4]}$ for $j = 1, 2, 3$.
This results in $H(W_5) = \sum_{1 \le j \le 4} H(W_j) - H(W_1 W_2 W_3 W_4)$. And, with $B = H(W_1 W_2 W_3 W_4)$, in

$$ 3B \le \sum_{1 \le i \le 4} H(W_i) + \sum_{1 \le i < j \le 4} H(S_{i\to j}). $$